Phylogenetic Motif Detection by Expectation-Maximization on Evolutionary Mixtures

Authors

  • Alan M. Moses
  • Derek Y. Chiang
  • Michael B. Eisen
Abstract

The preferential conservation of transcription factor binding sites implies that non-coding sequence data from related species will prove a powerful asset to motif discovery. We present a unified probabilistic framework for motif discovery that incorporates evolutionary information. We treat aligned DNA sequence as a mixture of evolutionary models, for motif and background, and, following the example of the MEME program, provide an algorithm to estimate the parameters by Expectation-Maximization. We examine a variety of evolutionary models and show that our approach can take advantage of phylogenetic information to avoid false positives and discover motifs upstream of groups of characterized target genes. We compare our method to traditional motif finding on only conserved regions. An implementation will be made available at http://rana.lbl.gov.
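
The idea of fitting a mixture of evolutionary models by Expectation-Maximization can be illustrated with a toy sketch. This is not the authors' implementation. It assumes a gap-free pairwise alignment cut into non-overlapping windows, and a Jukes-Cantor-style model in which each component is described only by a mismatch probability (low for a slowly evolving, motif-like component; high for background). The function names (`window_likelihood`, `em_rate_mixture`) and all parameters are hypothetical, and the position-specific motif matrix of a real motif finder is deliberately left out.

```python
def window_likelihood(w1, w2, d):
    """Likelihood of an aligned window under a Jukes-Cantor-style pair model.

    Assumption: the base in species 1 is uniform (1/4) and the base in
    species 2 differs with probability d, spread evenly over 3 alternatives.
    """
    lik = 1.0
    for a, b in zip(w1, w2):
        lik *= 0.25 * ((1.0 - d) if a == b else d / 3.0)
    return lik


def clamp(x, eps=1e-6):
    """Keep a probability strictly inside (0, 1) so likelihoods stay positive."""
    return min(max(x, eps), 1.0 - eps)


def em_rate_mixture(seq1, seq2, width=8, n_iter=50,
                    weight=0.5, d_slow=0.05, d_fast=0.4):
    """EM for a two-component mixture of 'slow' and 'fast' evolving windows.

    Returns (weight of slow component, d_slow, d_fast, responsibilities),
    where the responsibilities come from the last E-step.
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to equal length")
    # Non-overlapping windows (a simplification made for this sketch).
    windows = [(seq1[i:i + width], seq2[i:i + width])
               for i in range(0, len(seq1) - width + 1, width)]
    if not windows:
        raise ValueError("alignment is shorter than one window")
    mismatches = [sum(a != b for a, b in zip(w1, w2)) for w1, w2 in windows]
    tiny = 1e-12  # guards against division by zero in the M-step
    resp = []
    for _ in range(n_iter):
        # E-step: posterior probability that each window is 'slow'.
        resp = []
        for w1, w2 in windows:
            p_slow = weight * window_likelihood(w1, w2, d_slow)
            p_fast = (1.0 - weight) * window_likelihood(w1, w2, d_fast)
            resp.append(p_slow / (p_slow + p_fast))
        # M-step: responsibility-weighted mixing weight and mismatch rates.
        n_slow = sum(resp)
        n_fast = len(windows) - n_slow
        weight = clamp(n_slow / len(windows))
        d_slow = clamp(sum(r * m for r, m in zip(resp, mismatches))
                       / (width * max(n_slow, tiny)))
        d_fast = clamp(sum((1.0 - r) * m for r, m in zip(resp, mismatches))
                       / (width * max(n_fast, tiny)))
    return weight, d_slow, d_fast, resp


if __name__ == "__main__":
    s1 = "ACGTACGT" * 6
    s2 = ("ACGTACGT" + "CATTGAGA") * 3  # alternating conserved / diverged windows
    print(em_rate_mixture(s1, s2))
```

Windows with a high responsibility are the ones this toy model regards as conserved. A framework like the one described in the abstract would additionally model the motif's base preferences and the phylogenetic tree relating the species.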

Similar resources

Mclip: motif detection based on cliques of gapped local profile-to-profile alignments

A multitude of motif-finding tools have been published, which can generally be assigned to one of three classes: expectation-maximization, Gibbs-sampling or enumeration. Irrespective of this grouping, most motif detection tools only take into account similarities across ungapped sequence regions, possibly causing short motifs located peripherally and in varying distance to a 'core' m...

Sufficient statistics and expectation maximization algorithms in phylogenetic tree models

Motivation: Measuring evolutionary conservation is a routine step in the identification of functional elements in genome sequences. Although a number of studies have proposed methods that use the continuous time Markov models (CTMMs) to find evolutionarily constrained elements, their probabilistic structures have been less frequently investigated. Results: In this article, we investigate a suff...

Mixture Model based MAP Motif Discovering

In this paper a new maximum a posteriori (MAP) approach based on mixtures of multinomials is proposed for discovering probabilistic motifs in sequences. The main advantage of the proposed methodology is the ability to bypass the problem of overlapping motif occurrences among neighborhood positions in sequences through the use of a Markov Random Field (MRF) as a prior. This model consists of two...

MotifHyades: expectation maximization for de novo DNA motif pair discovery on paired sequences

Motivation: In higher eukaryotes, protein-DNA binding interactions are the central activities in gene regulation. In particular, DNA motifs such as transcription factor binding sites are the key components in gene transcription. Harnessing the recently available chromatin interaction data, computational methods are desired for identifying the coupling DNA motif pairs enriched on long-range chrom...

Expectation Maximization for Combined Phylogenetic and Hidden Markov Models

An expectation maximization (EM) algorithm is derived to estimate the parameters of a phylogenetic model, a probabilistic model of molecular evolution that considers the phylogeny, or evolutionary tree, by which a set of present-day organisms are related. The EM algorithm is then extended for use with a combined phylogenetic and hidden Markov model. An efficient method is also shown for computi...

Journal title:
  • Pacific Symposium on Biocomputing

Publication date: 2004